From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation
Fang, Junyi, Chen, Yuxun, Chen, Yuxin, Zhang, Chen
The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via variance-aware adaptation. It achieves tighter regret bounds than UCB1 and UCB-V, with gap-dependent regret of order $K \sigma_{\max}^2 \log T / \Delta$ and gap-independent regret of order $\sqrt{K T \log T}$. RAVEN-UCB incorporates three innovations: (1) variance-driven exploration using $\sqrt{\hat{\sigma}_k^2 / (N_k + 1)}$ in confidence bounds, (2) adaptive control via $\alpha_t = \alpha_0 / \log(t + \epsilon)$, and (3) constant-time recursive updates for efficiency. Experiments across non-stationary patterns (distributional changes, periodic shifts, and temporary fluctuations) in synthetic and logistics scenarios demonstrate its superiority over state-of-the-art baselines, confirming theoretical and practical robustness.
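The three ingredients named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the constant $\alpha_0$ and the shift inside the logarithm (added here to keep $\log(\cdot) > 0$ for small $t$) are illustrative guesses, and the running variance is maintained with Welford's constant-time recursive update, which matches the abstract's efficiency claim but may differ in detail from the authors' update rule.

```python
import math

def raven_ucb_index(mean, var, n, t, alpha0=1.0, eps=1e-8):
    """Variance-aware index (schematic): empirical mean plus an
    exploration bonus sqrt(var / (n + 1)) scaled by an adaptive
    coefficient alpha_t that decays logarithmically in t."""
    alpha_t = alpha0 / math.log(t + math.e + eps)  # +e keeps the log positive
    return mean + alpha_t * math.sqrt(var / (n + 1))

class RavenUCB:
    """Minimal sketch of a variance-adaptive UCB agent over K arms."""

    def __init__(self, K, alpha0=1.0):
        self.K = K
        self.alpha0 = alpha0
        self.n = [0] * K        # pull counts N_k
        self.mean = [0.0] * K   # running empirical means
        self.m2 = [0.0] * K     # running sums of squared deviations
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before trusting the indices.
        for k in range(self.K):
            if self.n[k] == 0:
                return k
        return max(range(self.K), key=lambda k: raven_ucb_index(
            self.mean[k], self.m2[k] / self.n[k], self.n[k],
            self.t, self.alpha0))

    def update(self, k, reward):
        # Welford's constant-time recursive update of mean and variance.
        self.n[k] += 1
        delta = reward - self.mean[k]
        self.mean[k] += delta / self.n[k]
        self.m2[k] += delta * (reward - self.mean[k])
```

With deterministic rewards the variance term vanishes and the agent quickly concentrates on the better arm; the variance bonus only drives extra exploration where rewards are noisy.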
Untangling tradeoffs between recurrence and self-attention in neural networks
Kerg, Giancarlo, Kanuparthi, Bhargav, Goyal, Anirudh, Goyette, Kyle, Bengio, Yoshua, Lajoie, Guillaume
Attention and self-attention mechanisms, inspired by cognitive processes, are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and rely on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attention affects gradient propagation in recurrent networks, and prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies. Building on these results, we propose a relevancy screening mechanism, inspired by the cognitive process of memory consolidation, that allows for a scalable use of sparse self-attention with recurrence. While providing guarantees to avoid vanishing gradients, we use simple numerical experiments to demonstrate the tradeoffs in performance and computational resources by efficiently balancing attention and recurrence. Based on our results, we propose a concrete direction of research to improve scalability of attentive networks.
Propagation Graph Estimation by Pairwise Alignment of Time Series Observation Sequences
Hayashi, Tatsuya, Nakamura, Atsuyoshi
Various phenomena propagate through the medium of individuals. Some biological cells fire right after the firing of their neighbor cells, and such firing propagates from cell to cell. In this paper, we study the problem of estimating the firing propagation order of cells from the $\{0,1\}$-state sequences of all the cells, where '1' at the $i$-th position means that the cell is firing at time step $i$. We propose a method that estimates the propagation direction between two cells from the average, taken over all minimum-cost alignments, of one cell's time delays at the positions matched to the other cell, and show how to calculate this average efficiently. The propagation order estimated by our proposed method is demonstrated to be correct on our synthetic datasets, and to be consistent with the visually recognizable firing order on a dataset of chemical-signal-emitting state sequences of soil-dwelling amoebae.
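The delay-averaging idea can be illustrated with a simplified sketch. This version aligns the two cells' firing times one-to-one with a single minimum-cost dynamic program (per-match cost $|t_b - t_a|$, plus a gap penalty for unmatched firings, both illustrative choices), then averages the signed delays; the paper's method averages over *all* minimum-cost alignments, which this sketch does not do.

```python
def firing_times(states):
    """Positions i where the {0,1}-state sequence has a '1' (the cell fires)."""
    return [i for i, s in enumerate(states) if s == 1]

def mean_delay(states_a, states_b, gap=5):
    """Schematic estimate of the propagation delay from cell A to cell B.
    A positive result means A tends to fire before B, suggesting
    propagation in the direction A -> B (and vice versa)."""
    ta, tb = firing_times(states_a), firing_times(states_b)
    n, m = len(ta), len(tb)
    INF = float("inf")
    # dp[i][j] = (min cost, sum of signed delays, matched pairs)
    # for aligning ta[:i] with tb[:j].
    dp = [[(INF, 0, 0)] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            c, s, k = dp[i][j]
            if c == INF:
                continue
            if i < n and j < m:  # match firing ta[i] with firing tb[j]
                cand = (c + abs(tb[j] - ta[i]), s + (tb[j] - ta[i]), k + 1)
                if cand[0] < dp[i + 1][j + 1][0]:
                    dp[i + 1][j + 1] = cand
            if i < n and c + gap < dp[i + 1][j][0]:  # leave ta[i] unmatched
                dp[i + 1][j] = (c + gap, s, k)
            if j < m and c + gap < dp[i][j + 1][0]:  # leave tb[j] unmatched
                dp[i][j + 1] = (c + gap, s, k)
    _, ssum, k = dp[n][m]
    return ssum / k if k else 0.0
```

For example, if cell B consistently fires one step after cell A, the estimated delay is $+1$ in the direction A to B and $-1$ in the reverse direction.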
Non-Cooperative Inverse Reinforcement Learning
Zhang, Xiangyuan, Zhang, Kaiqing, Miehling, Erik, Basar, Tamer
Making decisions in the presence of a strategic opponent requires one to take into account the opponent's ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function. As a result of the one-sided incomplete information, the multi-stage game can be decomposed into a sequence of single-stage games expressed by a recursive formula. Solving this recursive formula yields the value of the N-CIRL game and the more informed player's equilibrium strategy.
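The abstract does not state the recursive formula itself. Purely as a schematic illustration (the paper's exact operator may differ), value recursions for zero-sum games with one-sided incomplete information typically take a minimax Bellman form over the uninformed player's belief $b$ about the hidden objective $\theta$:

```latex
V_t(b) \;=\; \max_{\sigma_1}\,\min_{\sigma_2}\;
\mathbb{E}_{\theta \sim b,\; a_1 \sim \sigma_1,\; a_2 \sim \sigma_2}
\Big[\, r_\theta(a_1, a_2) \;+\; V_{t+1}\big(b'(b, a_1)\big) \,\Big],
```

where each stage is itself a single-stage game and $b'$ denotes the Bayesian update of the belief after observing the informed player's action $a_1$, which is exactly where the informed player's incentive to mask its objective enters.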
A Fast Greedy Algorithm for Generalized Column Subset Selection
Farahat, Ahmed K., Ghodsi, Ali, Kamel, Mohamed S.
This paper defines a generalized column subset selection problem which is concerned with the selection of a few columns from a source matrix $A$ that best approximate the span of a target matrix $B$. The paper then proposes a fast greedy algorithm for solving this problem and draws connections to different problems that can be efficiently solved using the proposed algorithm.
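The greedy criterion can be sketched directly from the problem statement: at each step, pick the column of $A$ that most reduces the Frobenius residual of projecting $B$ onto the span of the columns selected so far. This naive sketch recomputes each least-squares projection from scratch, so it is far slower than the paper's algorithm, which reaches the same selections via fast recursive updates.

```python
import numpy as np

def greedy_gcss(A, B, k):
    """Schematic greedy generalized column subset selection.
    Returns the indices of k columns of A chosen greedily to
    minimize || B - P_S B ||_F^2, where P_S projects onto the
    span of the selected columns."""
    _, d = A.shape
    selected = []
    for _ in range(k):
        best, best_res = None, np.inf
        for j in range(d):
            if j in selected:
                continue
            S = A[:, selected + [j]]
            # Least-squares projection of B onto span(S).
            coef, *_ = np.linalg.lstsq(S, B, rcond=None)
            res = np.linalg.norm(B - S @ coef) ** 2
            if res < best_res:
                best, best_res = j, res
        selected.append(best)
    return selected
```

For instance, with $A$ the identity and $B$ supported on two coordinates, the greedy procedure selects exactly the two corresponding columns.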